1.  Introduction.


This paper is a sequel to an earlier one (Hartley and Rao, 1968) on the
same topic. Accordingly, it will be necessary to briefly recall the basic results
of the earlier paper and relate that paper to the present one. Our first
paper was predominantly concerned with simple random sampling (with or without
replacement) from a finite population. In the present paper we are concerned
with examining the relation of our findings to the more complex sampling
procedures such as unequal probability sampling as well as stratified and
multistage sampling.

The basic feature of our theory was a special parametrisation of a finite
population of $N$ units with $k$ characteristics attached to each unit. Denote by
$y_i$ the $k$-vector attached to the $i$-th unit. We assume that all elements of the
$y_i$ are measured on discrete scales and that only a finite set of $T$ measurement
vectors $y_t$ ($t = 1, 2, \ldots, T$) are possible for the $y_i$. Denote then by

$$N_t = \text{no. of units in the population having } y_t \tag{1}$$

satisfying the conditions

$$N_t \ge 0 \quad \text{and} \quad \sum_{t=1}^{T} N_t = N. \tag{2}$$

Henceforth, sums and products for $t$ are over $1, 2, \ldots, T$.

The parameters $N_t$ completely describe any finite population. The
number $T$ is usually large, although occasions sometimes arise when $T$ is small
or moderate and the estimation of the $N_t$ is of intrinsic interest, as for
example when the $N_t$ represent a frequency distribution such as the number of
households in the community comprising $t$ persons. However, in most cases we
shall be concerned with the estimation of a few simple parametric functions
such as the population moments and not with the separate estimation of the
excessively large number of parameters $N_t$.
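The parametrisation can be illustrated with a minimal sketch. The population of household sizes and the helper names `scale_loads` and `raw_moment` are hypothetical and chosen only for illustration; the point is that the loads $N_t$ alone suffice to compute population moments.

```python
from collections import Counter
from fractions import Fraction


def scale_loads(values):
    """Return {y_t: N_t}, the number of units carrying each scale value."""
    return dict(Counter(values))


def raw_moment(loads, r):
    """r-th raw moment, sum_t N_t * y_t**r / N, computed from the loads alone."""
    total_units = sum(loads.values())
    return Fraction(sum(count * y**r for y, count in loads.items()), total_units)


# Hypothetical population: sizes of N = 8 households.
population = [1, 2, 2, 3, 3, 3, 4, 6]
loads = scale_loads(population)
mean = raw_moment(loads, 1)
variance = raw_moment(loads, 2) - mean**2
print(loads, mean, variance)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the moments are not blurred by rounding.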

Finite population sampling will normally consist of (a) the sample design,
i.e., the procedure of drawing a sample of $n$ distinct units (where $n$ may be
fixed or random) and measuring the $y_i$ for these units, and (b) the use of
the measured $y_i$ to compute estimators of the population parameters.

In our previous paper we restricted (a) to simple random sampling and we
confined the computation of estimators (b) to what we termed 'scale-load'
estimators. These were defined as mathematical functions of the scale vectors
$y_t$ and of their sample loads (frequencies) $n_t$ = no. of units in the sample
having $y_t$. Thus any identifying labels, $i$, that may be attached to the units
may or may not be used for the implementation of the sample design; however,
labels are not directly used in the computation of the estimators. Nevertheless,
in situations where the labels, $i$, are observable characteristics of the
units and are considered informative observables, the labels may be adjoined
to the vectors $y_t$ as a $(k + 1)$-th component.

We were able to show that within the class of 'scale-load' estimators
many of the estimators in current use possess interesting optimality properties
in simple random sampling. Specifically the estimators are either UMV (unbiased
minimum variance) or maximum likelihood estimators or both. Some of these results
are briefly restated in Section 2. In the remaining sections of the present paper
we are concerned with the role these results play in the more complex sampling
procedures. Briefly our findings are: (1) The above parametrization of finite
populations will continue to yield useful likelihood formulations for sampling
designs, providing maximum likelihood and Bayesian estimation procedures. The UMV
property will be the exception rather than the rule. (2) We consider that
identifying labels of primary units (or all but the last stage units) will often
be available as well as informative. There are, however, situations in which
higher stage units are not labelled as is the case, for example, for certain
subsets of machine parts produced in bulk, the water supply of water works
produced during certain time periods, etc. Certain situations where labels of
higher stage units are not informative also exist, for example identifiable
subsets of certain lists. Both 'scale-load' and label-dependent estimators
are therefore required. As would be expected, there is usually no UMV estimator
in the class of label-dependent estimators. (3) A particular problem arises
when label dependence of estimators is used in conjunction with Bayesian concepts
and separate prior distributions are allowed for the individually identifiable
units. The resulting posterior distributions and hence Bayesian inferences do
not depend on the survey design, which in the framework of Bayesian theory
becomes a randomization procedure irrelevant in making posterior inferences.
However, the absurd result that Bayesian theory leads to when applied to simple
sampling or ultimate-stage unit sampling (Godambe, 1966) is perhaps our strongest
point in favor of examining estimators that do not depend on the labels of the
ultimate-stage units.

2.  Simple  random  sampling. 


If a simple random sample of fixed size $n$ is drawn without replacement
from the population of $N$ units, the likelihood of the $n_t$ is given by

$$L(N_1, \ldots, N_T) = \prod_t \binom{N_t}{n_t} \Big/ \binom{N}{n} \tag{3}$$

where $n_t \ge 0$ and $\sum n_t = n$. We confine ourselves here to the case of a single
character $y$ attached to the units (i.e., $k = 1$). In our previous paper we have
shown that any function of the $n_t$ is a UMV estimator of its expectation.
Specifically some of the more important parametric functions and their UMV
estimators are given below:

Parametric function                           UMV estimator

$N_t/N$                                       $n_t/n$

$\mu'_r = N^{-1}\sum N_t y_t^r$               $m'_r = n^{-1}\sum n_t y_t^r$

$\sigma^2 = \mu'_2 - \mu'^2_1$                $\frac{n(N-1)}{N(n-1)}\,(m'_2 - m'^2_1)$

Notice that the estimators do not depend on $T$ or the non-observed $y_t$. When $N/n$
is an integer, $n_t/n$ and $m'_r$ are also the maximum likelihood estimators (see the
Appendix). When $N/n$ is not integral, the maximization of (3) over the integral
grid $N_t$ can be achieved by the algorithm given in the Appendix; however, since
UMV estimators exist, the maximum likelihood estimators may not have particular
merit for small samples. The possibility of using maximum likelihood estimators
of the $N_t$ when $T$ is small and the $N_t$ are parameters of interest is being
examined by a Monte Carlo study.
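As a numerical check on the unbiasedness of the estimators of the population mean and variance, the following sketch enumerates every simple random sample without replacement from a small hypothetical population and averages the estimates. The population values and the function name `umv_estimates` are illustrative assumptions; the variance estimator used is $\frac{n(N-1)}{N(n-1)}(m'_2 - m'^2_1)$, with $\sigma^2$ defined with divisor $N$.

```python
from fractions import Fraction
from itertools import combinations


def umv_estimates(sample, N):
    """Return (m'_1, estimate of sigma^2) from the sampled values.

    Both depend on the sample only through the loads n_t, since they are
    built from the sample moments m'_1 and m'_2.
    """
    n = len(sample)
    m1 = Fraction(sum(sample), n)
    m2 = Fraction(sum(y * y for y in sample), n)
    var_hat = Fraction(n * (N - 1), N * (n - 1)) * (m2 - m1**2)
    return m1, var_hat


# Hypothetical population with mean 3 and variance (divisor N) equal to 2.
population = [1, 2, 2, 3, 3, 3, 4, 6]
N, n = len(population), 3

# Every subset of n distinct units is equally likely under the design.
estimates = [
    umv_estimates([population[i] for i in idx], N)
    for idx in combinations(range(N), n)
]
avg_mean = sum(e[0] for e in estimates) / len(estimates)
avg_var = sum(e[1] for e in estimates) / len(estimates)
print(avg_mean, avg_var)
```

Averaging over all $\binom{N}{n}$ samples is the design expectation, so the two averages can be compared directly with the population mean and variance.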

Turning now to Bayesian estimation, we have used in our previous paper
the mathematically convenient prior distribution suggested by Hoadley (1968)
and given by

$$\varphi(N_1, \ldots, N_T) \propto \prod_t \frac{(N_t + v_t - 1)!}{N_t!\,(v_t - 1)!}, \qquad v_t > 0. \tag{5}$$


The 'Bayes estimator' of $\mu'_r$ is the posterior expectation of $\mu'_r$ and is
given by

$$E'(\mu'_r) = \left(1 - \frac{n}{N}\right)\left[w\,m'_r + (1 - w)\,\mu^*_r\right] + \frac{n}{N}\,m'_r \tag{6}$$

where

$$w = n/(n + v), \qquad v = \sum v_t, \qquad \mu^*_r = v^{-1}\sum v_t y_t^r. \tag{8}$$

It should be noted that the estimator (6) only requires the knowledge of
$\mu^*_r$ (the prior mean of $\mu'_r$) and $v$, i.e., in the case of $r = 1$ the knowledge only
of the prior mean $\mu^*_1$ and the relative weight $w$ of the sample and prior
information. Moreover, although the $v_t$ are akin to prior sample frequencies, the
posterior mean is not simply the mean of the pooled 'sample' $v_t + n_t$ but duly
recognizes the fact that, as $n \to N$, the sample mean $m'_1$ will tend to $\mu'_1$ and
that the prior is ignored.

The expected loss which the decision maker faces by choosing the 'Bayes
estimator' is given by the posterior variance

$$V'(\mu'_1) = \left(1 - \frac{n}{N}\right)\frac{N + v}{N(n + v + 1)}\left[w\,m'_2 + (1 - w)\,\mu^*_2 - \{w\,m'_1 + (1 - w)\,\mu^*_1\}^2\right]. \tag{9}$$

The 'Bayes estimator' of $\sigma^2$ is given by

$$E'(\sigma^2) = E'(\mu'_2) - V'(\mu'_1) - \{E'(\mu'_1)\}^2. \tag{10}$$

It should be noted that, if the prior information is solely based on a pilot
sample, $\mu^*_r$ and $v$ would roughly represent the $r$-th sample moment based on the
pilot sample and the pilot sample size respectively.
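The behaviour of the posterior mean and variance can be sketched numerically. The code below assumes the compound multinomial posterior implied by prior (5): the posterior mean is taken as $(1 - n/N)[w\,m'_r + (1 - w)\mu^*_r] + (n/N)\,m'_r$ with $w = n/(n + v)$, and the posterior variance of the mean as $(1 - n/N)\frac{N + v}{N(n + v + 1)}$ times the variance of the pooled loads. The function names, the sample loads and the prior loads are hypothetical.

```python
from fractions import Fraction


def _moment(loads, r):
    """r-th raw moment of a {value: weight} table."""
    total = sum(loads.values())
    return Fraction(sum(c * y**r for y, c in loads.items()), total)


def bayes_moment(sample_loads, prior_loads, N, r=1):
    """Posterior mean of the r-th population moment (assumes n >= 1)."""
    n = sum(sample_loads.values())
    v = sum(prior_loads.values())
    w = Fraction(n, n + v)
    pooled = w * _moment(sample_loads, r) + (1 - w) * _moment(prior_loads, r)
    f = Fraction(n, N)  # sampling fraction
    return (1 - f) * pooled + f * _moment(sample_loads, r)


def posterior_variance_of_mean(sample_loads, prior_loads, N):
    """Posterior variance of the population mean under the same assumptions."""
    n = sum(sample_loads.values())
    v = sum(prior_loads.values())
    w = Fraction(n, n + v)
    p1 = w * _moment(sample_loads, 1) + (1 - w) * _moment(prior_loads, 1)
    p2 = w * _moment(sample_loads, 2) + (1 - w) * _moment(prior_loads, 2)
    factor = (1 - Fraction(n, N)) * Fraction(N + v, N * (n + v + 1))
    return factor * (p2 - p1**2)


# Hypothetical inputs: N = 8 units, n = 3 sampled, prior weight v = 2.
sample = {2: 1, 3: 2}
prior = {1: 1, 3: 1}
post_mean = bayes_moment(sample, prior, N=8)
post_var = posterior_variance_of_mean(sample, prior, N=8)

# A complete census (n = N) makes the prior irrelevant.
census = {1: 1, 2: 2, 3: 3, 4: 1, 6: 1}
census_mean = bayes_moment(census, prior, N=8)
census_var = posterior_variance_of_mean(census, prior, N=8)
print(post_mean, post_var, census_mean, census_var)
```

The census case illustrates the remark that the prior is ignored as $n \to N$: the posterior mean equals the sample mean and the posterior variance vanishes.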

Turning to simple random sampling with replacement, suppose a random sample
of fixed size $m$ is drawn with equal probability and with replacement. Let
$n$ denote the number of distinct units in the sample and $n_t$ the number of distinct
units having the value $y_t$ in the sample. The total likelihood is given by

$$L(N_1, \ldots, N_T) = P(n) \prod_t \binom{N_t}{n_t} \Big/ \binom{N}{n} \tag{11}$$

where the probability $P(n)$ is a function only of $m$ and $N$. For this sample
design no UMV estimator exists, but the maximum likelihood estimator of $\mu'_r$ is
$m'_r = n^{-1}\sum n_t y_t^r$, provided $N = c \times$ least common multiple of $1, 2, \ldots, m$
($c$ = integer). In particular, the maximum likelihood estimator of the population
mean $\mu'_1$ is the sample mean based only on the distinct units in the sample and
it is uniformly more efficient than the customary sample mean based on all the
sample draws. With the prior distribution (5), the 'Bayes estimator' of $\mu'_r$,
the posterior variance of $\mu'_1$ and the 'Bayes estimator' of $\sigma^2$ are
respectively given by (6), (9) and (10), where $n$ and the $n_t$ are as defined above.
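The efficiency claim for the mean of the distinct units can be checked by brute force on a small case. The sketch below enumerates all $N^m$ equally likely ordered samples drawn with replacement from a hypothetical population of four values and compares the two estimators of the population mean; the numbers are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population of N = 4 units; m = 3 draws with replacement.
population = [1, 2, 4, 9]
N, m = len(population), 3
mu = Fraction(sum(population), N)

draws = list(product(range(N), repeat=m))  # all N**m equally likely samples
mse_all = Fraction(0)
mse_distinct = Fraction(0)
bias_distinct = Fraction(0)
for d in draws:
    mean_all = Fraction(sum(population[i] for i in d), m)
    distinct = set(d)  # labels of the distinct units in this sample
    mean_distinct = Fraction(sum(population[i] for i in distinct), len(distinct))
    mse_all += (mean_all - mu) ** 2
    mse_distinct += (mean_distinct - mu) ** 2
    bias_distinct += mean_distinct - mu

mse_all /= len(draws)
mse_distinct /= len(draws)
bias_distinct /= len(draws)
print(bias_distinct, mse_distinct, mse_all)
```

In this example the mean of the distinct units is unbiased and has the smaller mean square error, in line with the statement above.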


3.  Estimation  with  concomitant  variables. 


In our earlier paper we have considered a situation customarily dealt with
by the ratio or regression method of estimation in which two variates $y$ and $x$ are
attached to each of the units and the population mean $\bar{Y}$ of the 'target variate' $y$
is to be estimated utilizing the available information about $x$. Assuming that
only the population mean $\bar{X}$ of $x$ is known, we have shown that an approximation
to the maximum likelihood estimator of $\bar{Y}$ is closely related to the customary
regression estimator, provided the sample size $n$ is moderately large. In this
section we extend this result to multiple concomitant variables $x_1, \ldots, x_k$,
assuming that only the population means $\bar{X}_1, \ldots, \bar{X}_k$ are known. We show
that, for moderately large $n$, an approximation to the maximum likelihood estimator
of $\bar{Y}$ is closely related to the customary multiple regression estimator.

As before, we assume that a finite set of $T$ distinct, known values $y_t$ are
feasible for $y$. Likewise, we assume that $I_j$ distinct, known values $x_{j i_j}$ are
feasible for $x_j$ ($j = 1, \ldots, k$). Let $N_{i_1 \cdots i_k t}$ denote the number of units
in the population which have $x_{1 i_1}, \ldots, x_{k i_k}$ and $y_t$ attached to them. Let
$n_{i_1 \cdots i_k t}$ be the number of units in the simple random sample of size $n$ (drawn
without replacement) which have $x_{1 i_1}, \ldots, x_{k i_k}$ and $y_t$ attached to them.

We consider only the multinomial situation in which $N \to \infty$ and
$N_{i_1 \cdots i_k t}/N \to P_{i_1 \cdots i_k t}$ while $n$ is held fixed. The likelihood $L$ is then
given by the multinomial distribution with probabilities $P_{i_1 \cdots i_k t}$. The
restrictions on the $P_{i_1 \cdots i_k t}$ are given by

$$P \ge 0, \qquad P'1 = 1 \qquad \text{and} \qquad P'Z = \bar{X}' \tag{12}$$

where $P$ is the column vector of the $P_{i_1 \cdots i_k t}$, $1$ is the vector of 1's of the
same length, $\bar{X}' = (\bar{X}_1, \ldots, \bar{X}_k)$ and $Z = (x_1 | \cdots | x_k)$ where
$P'x_j = \bar{X}_j$ ($j = 1, \ldots, k$). As in our previous paper, it can be shown that
for moderate sample sizes $n$ the global maximum of the multinomial likelihood can
only be attained if $P_{i_1 \cdots i_k t} = 0$ for all those variate combinations for which
$n_{i_1 \cdots i_k t} = 0$, and $P_{i_1 \cdots i_k t} > 0$ for the remainder. Confining then the
maximization to the latter $P_{i_1 \cdots i_k t}$ only and introducing the Lagrangian
multipliers $\lambda$ and $\mu' = (\mu_1, \ldots, \mu_k)$, the maximization of $\log L$ subject to
(12) is attained for $P = \hat{P}$ where

$$\hat{P}_{i_1 \cdots i_k t} = \frac{n_{i_1 \cdots i_k t}}{n}\left[1 + \frac{1}{n}\sum_{j=1}^{k} \mu_j\,(x_{j i_j} - \bar{X}_j)\right]^{-1}. \tag{13}$$

Expanding $\hat{P}'1 = 1$ to the first three terms we obtain

$$n(\bar{x} - \bar{X})'\mu = \mu' X^{*\prime}X^{*}\mu \tag{14}$$

where $\bar{x}' = (\bar{x}_1, \ldots, \bar{x}_k)$ is the vector of sample means and
$X^{*\prime}X^{*} = S^{*} = (s^*_{jp})$ where

$$s^*_{jp} = n^{-1}\sum_{i_1, \ldots, i_k}\sum_t n_{i_1 \cdots i_k t}\,(x_{j i_j} - \bar{X}_j)(x_{p i_p} - \bar{X}_p).$$

It is readily seen that the solution of (14) is given by

$$\mu = n\,(X^{*\prime}X^{*})^{-1}(\bar{x} - \bar{X}). \tag{15}$$

Now using (15) and expanding (13) to the first two terms we get

$$\hat{P} = \frac{1}{n}\left[\mathbf{n} + X^{+}(X^{*\prime}X^{*})^{-1}(\bar{X} - \bar{x})\right] \tag{16}$$

where $\mathbf{n}$ is the vector of the $n_{i_1 \cdots i_k t}$ and $X^{+}$ is given by
$y'X^{+} = n\,(s^*_{1y}, \ldots, s^*_{ky})$ where $P'y = \bar{Y}$ and

$$s^*_{jy} = n^{-1}\sum_{i_1, \ldots, i_k}\sum_t n_{i_1 \cdots i_k t}\,y_t\,(x_{j i_j} - \bar{X}_j).$$

An improved approximation, along the lines of our previous paper, can be
obtained by expanding (13) to the first three terms.

Using (16), an approximation to the maximum likelihood estimator of the
population mean $\bar{Y} = P'y$ is given by

$$\hat{\bar{Y}} = \hat{P}'y = \bar{y} + (\bar{X} - \bar{x})'S^{*-1}s^{*} \tag{17}$$

where $s^{*}$ is the $k$-vector of the $s^*_{jy}$. The customary multiple regression
estimator is given by

$$\hat{\bar{Y}}_r = \bar{y} + (\bar{X} - \bar{x})'S^{-1}s \tag{18}$$

where $S = (s_{jp})$, $s$ is the $k$-vector of the $s_{jy}$ and

$$s_{jy} = s^*_{jy} - \bar{y}\,(\bar{x}_j - \bar{X}_j).$$

Although (17) differs slightly from (18), the above development clearly shows
that, at least in large samples, the customary multiple regression estimator
is essentially the maximum likelihood estimator.
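For a single concomitant variable ($k = 1$) the two estimators can be compared directly. The sketch below assumes that the starred quantities are moments taken about the known population mean $\bar{X}$ (with $y$ uncentred in $s^*_{xy}$), while the customary estimator uses moments about the sample means. The data and the function name are hypothetical.

```python
from fractions import Fraction


def regression_estimates(x, y, X_bar):
    """Return (approximate ML estimate, customary regression estimate), k = 1."""
    X_bar = Fraction(X_bar)
    n = len(x)
    x_bar = Fraction(sum(x), n)
    y_bar = Fraction(sum(y), n)
    # Starred moments: deviations of x are taken about the known mean X_bar.
    s_star_xx = sum((xi - X_bar) ** 2 for xi in x) / n
    s_star_xy = sum(yi * (xi - X_bar) for xi, yi in zip(x, y)) / n
    # Customary moments: deviations are taken about the sample means.
    s_xx = sum((xi - x_bar) ** 2 for xi in x) / n
    s_xy = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y)) / n
    ml_approx = y_bar + (X_bar - x_bar) * s_star_xy / s_star_xx
    customary = y_bar + (X_bar - x_bar) * s_xy / s_xx
    return ml_approx, customary


# Hypothetical sample lying exactly on the line y = 2 + 3x; known X_bar = 7/2.
x = [1, 2, 4, 5]
y = [5, 8, 14, 17]
ml_approx, customary = regression_estimates(x, y, Fraction(7, 2))
print(ml_approx, customary)

# When the sample mean of x equals X_bar, both reduce to the sample mean of y.
balanced = regression_estimates(x, y, 3)
print(balanced)
```

With only four observations the two values differ visibly; the text's claim of near-equivalence concerns large samples, where $\bar{x} - \bar{X}$ is small.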

4.  Stratified simple random sampling without replacement.

4.1  UMV  Estimator. 

Suppose there are $L$ strata in the population with $N_i$ units in the $i$-th
stratum ($i = 1, \ldots, L$). Denote by $N_{it}$ the number of units in the population
belonging to the $i$-th stratum and having the measurement $y_{it}$ ($t = 1, \ldots, T_i$) so
that $\sum_t N_{it} = N_i$ ($\sum_i \sum_t N_{it} = N$). A stratified simple random sample
$(n_1, \ldots, n_L)$ is drawn without replacement ($\sum_i n_i = n$), and $n_{it}$ denotes the
number of units in the sample belonging to the $i$-th stratum and having the
measurement $y_{it}$ ($\sum_t n_{it} = n_i$). Now the likelihood of the $n_{it}$ is given by

$$L(N_{11}, \ldots, N_{LT_L}) = \prod_{i=1}^{L}\left[\prod_t \binom{N_{it}}{n_{it}} \Big/ \binom{N_i}{n_i}\right]. \tag{19}$$

Therefore, the $n_{it}$ are complete sufficient for the $N_{it}$ and, hence, the UMV
estimator of the population mean $\bar{Y} = N^{-1}\sum_i \sum_t N_{it} y_{it}$ is the customary
estimator

$$\hat{\bar{Y}} = N^{-1}\sum_i \sum_t \hat{N}_{it}\,y_{it} = N^{-1}\sum_i N_i\,\bar{y}_i \tag{20}$$

where $\hat{N}_{it} = (N_i/n_i)\,n_{it}$ is the UMV estimator of $N_{it}$. It also follows that
the maximum likelihood estimators of the $N_{it}$ and $\bar{Y}$ are the UMV estimators
$\hat{N}_{it}$ and $\hat{\bar{Y}}$ respectively, when the $N_i/n_i$ are integral. Notice that each
stratum is described by its separate set of parameters, i.e., we have an
additional subscript $i$ to index the strata.
An interesting special case occurs when the stratification is according
to the size of the units, say $x_i$. If we assume that $x_i$ is constant within
strata and use the allocation proportional to total size, i.e.,
$n_i = n\,(N_i x_i / \sum_i N_i x_i) = N_i P_i$ (say) where $\sum_i N_i P_i = n$, we get

$$\hat{\bar{Y}} = N^{-1}\sum_i \sum_t \frac{n_{it}\,y_{it}}{P_i} \tag{21}$$

which is a 'Horvitz-Thompson' type estimator.
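The equivalence between the customary stratified estimator and its 'Horvitz-Thompson' form can be sketched as follows. The strata sizes and sampled values are hypothetical, and the function names are illustrative; the inclusion probability of a unit in stratum $i$ is taken as $n_i/N_i$.

```python
from fractions import Fraction


def stratified_mean(strata):
    """Customary estimator N^{-1} * sum_i N_i * ybar_i.

    strata: list of (N_i, sampled values from stratum i).
    """
    N = sum(N_i for N_i, _ in strata)
    total = sum(Fraction(N_i, len(sample)) * sum(sample) for N_i, sample in strata)
    return total / N


def horvitz_thompson_mean(strata):
    """Same estimator written as sum of y / (inclusion probability), over N."""
    N = sum(N_i for N_i, _ in strata)
    total = Fraction(0)
    for N_i, sample in strata:
        inclusion_prob = Fraction(len(sample), N_i)  # n_i / N_i within stratum i
        total += sum(Fraction(y) / inclusion_prob for y in sample)
    return total / N


# Hypothetical design: two strata of sizes 10 and 30, samples of sizes 2 and 3.
strata = [(10, [2, 4]), (30, [5, 7, 9])]
customary = stratified_mean(strata)
ht_form = horvitz_thompson_mean(strata)
print(customary, ht_form)
```

Both functions weight each sampled value by $N_i/n_i$, so they agree for any allocation; proportional-to-size allocation only fixes what those weights are.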


4.2.  Bayesian  optimization  of  stratified  sampling. 


Ericson (1965) has presented a solution to the problem of optimum
allocation when prior information in the form of a prior distribution is available.
He has, however, assumed: (a) $N_i = \infty$, $i = 1, \ldots, L$, (b) normality of the
within stratum populations and (c) known within stratum population variances
$\sigma_i^2$. Assuming that the within stratum population means $\mu_i$ have independent
normal priors with means $m_i$ and variances $v'_i$, he has shown that the posterior
variance of the population mean $\mu = \sum \pi_i \mu_i$ is given by

$$V = \sum_i \pi_i^2 \left(\frac{1}{v'_i} + \frac{n_i}{\sigma_i^2}\right)^{-1} \tag{22}$$

where $\pi_i$ is the known proportion of the population units falling within the
$i$-th stratum. Ericson has given a computational algorithm to find $n_i \ge 0$
($i = 1, \ldots, L$) such that (22) is minimized subject to the cost constraint

$$\sum_i c_i n_i = C \tag{23}$$

where $C$ is the given budget.

Recently, Draper and Guttman (1968) have relaxed the assumption (c) and
presented a sequential allocation scheme which is simpler than Ericson's
algorithm. They have also considered the case of unknown proportions $\pi_i$. Using
our present approach, one of us (J. N. K. Rao, 1968) has given a solution which
is free from the restrictive assumptions (b) and (c). Extension to multiple
priors and/or multiple characteristics by the use of convex programming was
also considered. In this section we present a complete solution by relaxing
the assumption (a) also.

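We assume that prior information on the $N_{it}$ is available in the form of
(5) for each $i$ and that the priors are independent. Therefore, the prior
distribution of $N_{11}, \ldots, N_{LT_L}$ is

$$\varphi(N_{11}, \ldots, N_{LT_L}) \propto \prod_{i=1}^{L}\prod_t \frac{(N_{it} + v_{it} - 1)!}{N_{it}!\,(v_{it} - 1)!}, \qquad v_{it} > 0, \quad \sum_t v_{it} = v_i. \tag{24}$$

Now, since $\bar{Y} = N^{-1}\sum_i N_i \bar{Y}_i$ where $\bar{Y}_i$ is the $i$-th stratum population
mean, we get using (6) and (9) the posterior mean of $\bar{Y}$ as

$$E'(\bar{Y}) = N^{-1}\sum_i N_i\left[\left(1 - \frac{n_i}{N_i}\right)\sum_t \frac{n_{it} + v_{it}}{n_i + v_i}\,y_{it} + \frac{n_i}{N_i}\sum_t \frac{n_{it}}{n_i}\,y_{it}\right] \tag{25}$$

and the posterior variance of $\bar{Y}$ as

$$V'(\bar{Y}) = N^{-2}\sum_i N_i^2\left(1 - \frac{n_i}{N_i}\right)\left(1 + \frac{v_i}{N_i}\right)(n_i + v_i + 1)^{-1}\left[\sum_t \frac{n_{it} + v_{it}}{n_i + v_i}\,y_{it}^2 - \left(\sum_t \frac{n_{it} + v_{it}}{n_i + v_i}\,y_{it}\right)^2\right]. \tag{26}$$

Since the posterior variance (26) depends on the to-be-observed sample values
$n_{it}$, we take the expectation of (26) with respect to the marginal distribution
of the $n_{it}$. It follows from Hoadley (1968) that the marginal distribution of
the $n_{it}$ is given by

$$\prod_i\left[\prod_t \binom{n_{it} + v_{it} - 1}{n_{it}} \Big/ \binom{n_i + v_i - 1}{n_i}\right] \tag{27}$$

which is identical to that in the case of infinite populations with Dirichlet
prior distributions. Therefore, using the results of J. N. K. Rao (1968) it
follows from (26) that the expected posterior variance of $\bar{Y}$ is

$$E\,V'(\bar{Y}) = \sum_i \frac{N_i^2}{N^2}\left(1 - \frac{n_i}{N_i}\right)\left(1 + \frac{v_i}{N_i}\right)\frac{A_i}{n_i + v_i} \tag{28}$$

where

$$A_i = \frac{v_i}{v_i + 1}\left[\sum_t \frac{v_{it}}{v_i}\,y_{it}^2 - \left(\sum_t \frac{v_{it}}{v_i}\,y_{it}\right)^2\right]. \tag{29}$$

It follows, using (9) and (10), that

$$\text{Prior variance of } \bar{Y}_i = \left(\frac{1}{v_i} + \frac{1}{N_i}\right)A_i \tag{30}$$

and

$$\text{Prior mean of } S_i^2 = A_i \tag{31}$$

where $N_i \sigma_i^2 = (N_i - 1)\,S_i^2$.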

11  1  1 

Now (28) is a separable convex function in the $n_i$ and, therefore, the
values $n_i$ which minimize (28) subject to (23) and $0 \le n_i \le N_i$ ($i = 1, \ldots, L$)
can be obtained by convex programming.* It is also possible to develop a
sequential allocation procedure analogous to that of Draper and Guttman (1968).
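One way to carry out such a minimization is sketched below. It is not the authors' algorithm: it treats the $n_i$ as continuous, assumes the objective has the separable form $\sum_i (N_i/N)^2 (1 - n_i/N_i)(1 + v_i/N_i)\,A_i/(n_i + v_i)$, and solves the Karush-Kuhn-Tucker conditions by bisection on a single Lagrange multiplier. The function names and all numerical inputs are hypothetical.

```python
import math


def expected_posterior_variance(n, N, v, A):
    """Separable objective assumed for the allocation problem."""
    total = sum(N)
    return sum(
        (Ni / total) ** 2 * (1 - ni / Ni) * (1 + vi / Ni) * Ai / (ni + vi)
        for ni, Ni, vi, Ai in zip(n, N, v, A)
    )


def optimum_allocation(N, v, A, c, C):
    """Minimise the objective subject to sum(c_i * n_i) = C, 0 <= n_i <= N_i.

    Up to an additive constant each term equals K_i / (n_i + v_i), so the
    stationarity condition is K_i / (n_i + v_i)**2 = lam * c_i.
    """
    total = sum(N)
    K = [(Ni / total) ** 2 * (1 + vi / Ni) ** 2 * Ai for Ni, vi, Ai in zip(N, v, A)]

    def alloc(lam):
        # Stationary point for multiplier lam, clipped to the box [0, N_i].
        return [
            min(Ni, max(0.0, math.sqrt(Ki / (lam * ci)) - vi))
            for Ni, vi, Ki, ci in zip(N, v, K, c)
        ]

    def cost(lam):
        return sum(ci * ni for ci, ni in zip(c, alloc(lam)))

    lo, hi = 1e-12, 1e12  # the cost is non-increasing in lam
    for _ in range(200):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if cost(mid) > C:
            lo = mid
        else:
            hi = mid
    return alloc(hi)


# Hypothetical two-stratum problem with unit costs and a budget of 100.
N = [1000, 3000]
v = [10, 10]
A = [4.0, 25.0]
c = [1.0, 1.0]
C = 100
n_opt = optimum_allocation(N, v, A, c, C)
print(n_opt, expected_posterior_variance(n_opt, N, v, A))
```

The sketch assumes the budget is feasible, that is $C \le \sum_i c_i N_i$; rounding the continuous solution to integers is left open.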

It is important to note that the knowledge of the complete priors is not
essential for the optimum allocation; only that of the prior mean of $S_i^2$ and
the prior variance of $\bar{Y}_i$ is needed. If the priors are solely based on pilot
samples within each stratum, then $[(v_i + 1)/v_i]A_i$ and $v_i$ would roughly
represent the pilot-sample variance and the pilot sample size respectively.

The extension of the above results to multiple priors and/or multiple
characteristics follows along the lines of J. N. K. Rao (1968) and the
optimum allocation is obtained by convex programming.

* In our original version we ignored the restriction $n_i \le N_i$ and Ericson has
pointed this out.

5.  Single-stage  unequal  probability  sampling. 

In the preceding sections we have been mainly concerned with sampling
procedures in which all the units had an equal chance of selection. The
only exception is stratified sampling (Section 4.1) in which strata
allocations $n_i$ proportional to the products $N_i x_i$ gave all the $N_i$ units in the
$i$-th size stratum an equal inclusion probability of $P_i = n_i/N_i = n x_i/\sum_i N_i x_i$,
which varied from stratum to stratum. While unequal probability sampling by
'size strata' may be satisfactory for many practical purposes, situations
often arise in which we desire to vary the inclusion probability from unit
to unit. However, this type of unequal probability sampling mainly arises
in the selection of primary sampling units in multi-stage sampling which
we discuss in Section 6. Here we confine ourselves to the (rare) situations
where unequal probability sampling is used in 'single-stage' or
'ultimate-stage' sampling of units which are not necessarily identifiable in
advance of sampling.

As an example of p.p.s. sampling of this kind, we may mention here the
sampling of farm operators in Iowa counties proportional to the land acreages
they operate. If a county map can be covered by a rectangle with dimensions
$Z$ by $W$ and $(z_i, w_i)$, $i = 1, \ldots, r$ ($r > m$) denote uniform variables with
$0 < z_i < Z$ and $0 < w_i < W$, co-ordinates $(z_i, w_i)$ can be pinpointed on the map and
the interviewer can be instructed to ascertain (in order of draw) the first
$m$ operators whose land acreages contain the pinpointed land marks. This
results in p.p.s. sampling with replacement in which $p_i = x_i/X$ ($x_i$ = land
acreage of the $i$-th operator, only known for sampled operators; $X$ = total land
acreage, known in advance of sampling) where $p_i$ denotes the probability of
selection of the $i$-th operator at a single draw. This well-known situation of
'multinomial sampling' is the only one discussed in this section. We show
that it can be reparametrized in such a way that optimality properties can
be formulated for certain estimators.

Let $r_i = y_i/p_i$ and denote by $r_t$ ($t = 1, \ldots, T$) the set of $T$ discrete
scale points feasible for the $r_i$. Let the score $m_i$ denote the number of
times the $i$-th unit is included in the sample ($i = 1, \ldots, N$; $\sum m_i = m$). We now
classify the $r_i$ into the $T$ groups and denote by

$$p_{it} = \begin{cases} p_i & \text{if for the } i\text{-th unit } r_i = r_t \\ 0 & \text{otherwise,} \end{cases}$$

$$y_{it} = \begin{cases} y_i & \text{if for the } i\text{-th unit } r_i = r_t \\ 0 & \text{otherwise,} \end{cases}$$

$$m_{it} = \begin{cases} m_i & \text{if for the } i\text{-th unit } r_i = r_t \\ 0 & \text{otherwise.} \end{cases}$$

The multinomial distribution of scoring $m$ multinomial scores into $N$
classes with probabilities $p_i$ may then be written in the form

$$L(p_{11}, \ldots, p_{NT}) = m! \prod_{i,t} \frac{p_{it}^{m_{it}}}{m_{it}!} \tag{32}$$

and may be reparametrised as follows:

$$p_t = \sum_i p_{it}, \qquad \gamma_{it} = \begin{cases} p_{it}/p_t & \text{if } p_t > 0 \\ 0 & \text{if } p_t = 0 \end{cases}$$

so that

$$\sum_i \gamma_{it} = 1 \quad \text{if } p_t > 0.$$

Writing $m_t = \sum_i m_{it}$, we may factorize (32) as

$$L(p_{11}, \ldots, p_{NT}) = \left[m! \prod_t \frac{p_t^{m_t}}{m_t!}\right]\left[\prod_t m_t! \prod_i \frac{\gamma_{it}^{m_{it}}}{m_{it}!}\right]. \tag{35}$$

Equation (35) shows that the $m_t$ are sufficient for the $p_t$ since the latter
are only involved in the marginal distribution of the $m_t$ and not in the
conditional distribution of the $m_{it}$ given the $m_t$.

The maximum likelihood estimators of the $p_t$ are given by the ratios
$m_t/m$ and, hence, the maximum likelihood estimator of the population total

$$Y = \sum_i \sum_t y_{it} = \sum_i \sum_t r_t\,p_{it} = \sum_t r_t\,p_t$$

is given by

$$\hat{Y} = \sum_t r_t\,\frac{m_t}{m} = \frac{1}{m}\sum_t r_t \sum_i m_{it} = \frac{1}{m}\sum_i \frac{m_i\,y_i}{p_i}$$

which is the customary unbiased estimator of $Y$ in p.p.s. sampling with
replacement.
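The estimator of the total and its dependence on the ratio loads $m_t$ alone can be sketched as follows. The three-unit population, the selection probabilities and the function names are hypothetical; two units are given the same ratio $y_i/p_i$ so that grouping by scale ratio is visible. Unbiasedness is checked by enumerating all ordered samples of $m = 2$ draws with their probabilities.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def pps_total(draws, y, p):
    """Customary estimator (1/m) * sum over draws of y_i / p_i."""
    m = len(draws)
    return sum(Fraction(y[i]) / p[i] for i in draws) / m


def pps_total_from_ratio_loads(draws, y, p):
    """Same estimator computed from the loads m_t of the scale ratios r_t."""
    m = len(draws)
    ratio_loads = Counter(Fraction(y[i]) / p[i] for i in draws)  # {r_t: m_t}
    return sum(r_t * m_t for r_t, m_t in ratio_loads.items()) / m


# Hypothetical population: units 0 and 1 share the ratio y/p = 18.
y = [3, 6, 10]
p = [Fraction(1, 6), Fraction(1, 3), Fraction(1, 2)]
m = 2

expectation = Fraction(0)
forms_agree = True
for draws in product(range(len(y)), repeat=m):
    prob = Fraction(1)
    for i in draws:
        prob *= p[i]  # independent draws with replacement
    estimate = pps_total(draws, y, p)
    forms_agree = forms_agree and estimate == pps_total_from_ratio_loads(draws, y, p)
    expectation += prob * estimate

print(expectation, sum(y), forms_agree)
```

Because the second function never uses the unit labels once the ratios are formed, it illustrates in a small case why the $m_t$ carry all the information the estimator needs.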

Finally it should be noted that (35) is the likelihood for the scores
which do not necessarily represent counts of distinct units in the population.
However, it is possible to obtain the likelihood of the number of distinct
units in the sample with scale ratio $r_t$, which we denote by $n_t$. The
distribution of the $m_t$ is given by

$$m! \prod_t \frac{p_t^{m_t}}{m_t!} \tag{38}$$

and the conditional distribution of the $n_t$ given the $m_t$ can be obtained in
terms of the $\gamma_{it}$ from formula (4.3) of Kullback (1937). Finally the
likelihood of the $n_t$ can be obtained by summation of the product (i.e., the
joint distribution) over $m_t = n_t$ to $m$ subject to $\sum m_t = m$. We intend to
examine this distribution in more detail elsewhere.

Although only one single method of unequal probability sampling is
examined in this section and although the method examined is known not to
be particularly efficient, the discussion clearly indicates the possibility
of deriving concrete likelihoods for other unequal probability sampling
methods with the help of our technique of parametrisation.

6.  Two-stage sampling.

In order to simplify the discussion we confine ourselves to two-stage
sampling in which the primaries are selected with equal or unequal
probabilities. Consider then a population consisting of $L$ primary units
$i = 1, \ldots, L$ of which $l$ will be sampled and denote by $N_i$ the number of
secondary units in the $i$-th primary. Denote by $N_{it}$ the number of units in
the $i$-th primary which have the scale value $y_{it}$ ($t = 1, \ldots, T_i$) so that
$\sum_t N_{it} = N_i$. Let $u_i = 1$ if the $i$-th primary is in the sample and zero
otherwise. Denote by $P(u_1, \ldots, u_L)$ the joint distribution of the $u_i$ corresponding
to the primary sampling procedure adopted and let $n_i$ denote the number of
secondary units to be drawn from the $i$-th primary if it is in the sample.
The $n_i$ are all specified a priori for $i = 1, \ldots, L$. In this paper we only
consider equal probability sampling of secondaries without replacement.

If we denote by $n_{it}$ the number of secondaries having scale value $y_{it}$
in the $i$-th sampled primary, then the joint likelihood of the $u_i$ and the
$n_{it}$ is given by

$$L(N_{11}, \ldots, N_{LT_L}) = P(u_1, \ldots, u_L) \prod_{i:\,u_i = 1}\left[\prod_t \binom{N_{it}}{n_{it}} \Big/ \binom{N_i}{n_i}\right]. \tag{39}$$

6.1.  Maximum  likelihood  estimation. 


We confine ourselves here to the case of $N_i = \infty$, $i = 1, \ldots, L$. The
likelihood (39) reduces to

$$L(p_{11}, \ldots, p_{LT_L}) = P(u_1, \ldots, u_L) \prod_{i:\,u_i = 1}\left[n_i! \prod_t \frac{p_{it}^{n_{it}}}{n_{it}!}\right]. \tag{40}$$

Maximisation of (40) subject to $\sum_t p_{it} = 1$ for $i = 1, \ldots, L$ leads to

$$\hat{p}_{st} = n_{st}/n_s \quad \text{(primary } s \text{ in the sample)} \tag{41}$$

while any values of $p_{jt}$ are permissible for $j$ not in the sample. The
maximum likelihood solution will, therefore, in general not be unique.
Furthermore, we do not have complete sufficiency here and, hence, no UMV
estimator exists. We have not considered here 'scale-load' estimators which
do not depend on primary labels.

6.2.  Bayesian  estimation. 

Since the complete likelihood is given by (39), the posterior
distribution of the $N_{it}$ is identical to that in the case of stratified sampling
(Section 4), noting that $n_i = 0$ is allowed for the latter. Therefore, the
'Bayes estimator' of $\bar{Y}$ is given by (25) and it may be recast as

$$E'(\bar{Y}) = N^{-1}\left\{\sum\nolimits_1 N_i\left[\left(1 - \frac{n_i}{N_i}\right)\sum_t \frac{n_{it} + v_{it}}{n_i + v_i}\,y_{it} + \frac{n_i}{N_i}\sum_t \frac{n_{it}}{n_i}\,y_{it}\right] + \sum\nolimits_2 N_i \sum_t \frac{v_{it}}{v_i}\,y_{it}\right\} \tag{42}$$

where $\sum_1$ and $\sum_2$ are respectively the summations over sampled and non-sampled
primaries. It should be noted that we must have a prior distribution from
each primary. If the prior distribution is solely based on pilot samples,
this implies that the pilot sample must include at least one secondary unit
from each primary.
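A minimal sketch of this estimator follows, assuming that each primary contributes $(1 - n_i/N_i)$ times the mean of its pooled prior and sample loads plus the observed total divided by $N_i$, and that a non-sampled primary ($n_i = 0$) therefore contributes its prior mean. The data and function names are hypothetical.

```python
from fractions import Fraction


def posterior_mean_of_primary(N_i, prior_loads, sample_loads):
    """Posterior mean of one primary's mean; sample_loads is {} if unsampled."""
    n_i = sum(sample_loads.values())
    v_i = sum(prior_loads.values())
    sample_total = sum(c * y for y, c in sample_loads.items())
    prior_total = sum(c * y for y, c in prior_loads.items())
    pooled = Fraction(prior_total + sample_total, n_i + v_i)
    # (n_i / N_i) * (sample mean) is written as sample_total / N_i so that
    # the case n_i = 0 needs no special handling.
    return (1 - Fraction(n_i, N_i)) * pooled + Fraction(sample_total, N_i)


def bayes_two_stage_mean(primaries):
    """primaries: list of (N_i, prior loads {y: v_it}, sample loads {y: n_it})."""
    N = sum(N_i for N_i, _, _ in primaries)
    weighted = sum(
        N_i * posterior_mean_of_primary(N_i, prior, sample)
        for N_i, prior, sample in primaries
    )
    return weighted / N


# Hypothetical population of two primaries; only the first one is sampled.
primaries = [
    (8, {1: 1, 3: 1}, {2: 1, 3: 2}),
    (12, {4: 2, 6: 2}, {}),
]
estimate = bayes_two_stage_mean(primaries)
print(estimate)
```

Note that the probabilities with which the primaries were selected never enter the computation, which is the point taken up in the next paragraph of the text.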

The above analysis clearly shows that the sampling procedure adopted
for selection of the primaries is entirely irrelevant as far as a full
Bayesian analysis is concerned. However, if the likelihood based on a
selected estimator is used for a (partial) Bayesian analysis based on
insufficient statistics, then the posterior distribution and, hence, the
'Bayes estimator' would depend on the sampling procedure. These are the
two alternatives available to the Bayesian analyst and, at this stage, we
do not wish to take sides in this issue.